AITopics | deep linear residual network

Global Convergence of Gradient Descent for Deep Linear Residual Networks

Neural Information Processing Systems, Dec-25-2025, 01:22:21 GMT

We analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization: zero-asymmetric (ZAS) initialization. It is motivated by avoiding stable manifolds of saddle points. We prove that under the ZAS initialization, for an arbitrary target matrix, gradient descent converges to an $\varepsilon$-optimal point in $O\left( L^3 \log(1/\varepsilon) \right)$ iterations, which scales polynomially with the network depth $L$. Our result and the $\exp(\Omega(L))$ convergence time for the standard initialization (Xavier or near-identity) \cite{shamir2018exponential} together demonstrate the importance of the residual structure and the initialization in the optimization for deep linear neural networks, especially when $L$ is large.
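The setting in the abstract can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions, not the authors' code: the model is taken to be $U (I + W_L) \cdots (I + W_1)$ with a square output matrix $U$ and no separate input layer, the ZAS initialization is read as "all residual blocks $W_l$ and the output layer $U$ start at zero", and the loss is half the squared Frobenius distance to the target. The function name `train_zas`, the step size, and the step count are hypothetical choices made for the example.

```python
import numpy as np


def train_zas(target, depth, lr=0.01, steps=3000):
    """Gradient descent on 0.5 * ||U (I+W_L)...(I+W_1) - target||_F^2.

    Assumed reading of ZAS: every W_l and the output layer U start at zero.
    """
    d = target.shape[0]
    eye = np.eye(d)
    Ws = [np.zeros((d, d)) for _ in range(depth)]  # Ws[0] is W_1
    U = np.zeros((d, d))
    losses = []
    for _ in range(steps):
        # prefix[l] = (I + W_l) ... (I + W_1), with prefix[0] = I
        prefix = [eye]
        for W in Ws:
            prefix.append((eye + W) @ prefix[-1])
        # suffix[l] = U (I + W_L) ... (I + W_{l+1}), with suffix[depth] = U
        suffix = [None] * (depth + 1)
        suffix[depth] = U
        for l in range(depth, 0, -1):
            suffix[l - 1] = suffix[l] @ (eye + Ws[l - 1])
        # suffix[0] is the end-to-end matrix U (I+W_L)...(I+W_1)
        E = suffix[0] - target
        losses.append(0.5 * np.sum(E ** 2))
        # Gradients are computed from the current iterate, then applied together
        grad_U = E @ prefix[depth].T
        grad_Ws = [suffix[l].T @ E @ prefix[l - 1].T
                   for l in range(1, depth + 1)]
        U = U - lr * grad_U
        Ws = [W - lr * g for W, g in zip(Ws, grad_Ws)]
    return U, Ws, losses


if __name__ == "__main__":
    # Hypothetical 3x3 target chosen only for the demonstration
    target = np.array([[0.6, -0.3, 0.0],
                       [0.1, 0.2, -0.5],
                       [0.0, 0.4, 0.3]])
    U, Ws, losses = train_zas(target, depth=4)
    print("first loss:", losses[0], "last recorded loss:", losses[-1])
```

At the zero starting point the gradient with respect to every $W_l$ vanishes but the gradient with respect to $U$ does not, so the first step moves only the output layer; this is one way to see why the starting point is not itself a stationary point.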

global convergence, gradient descent, initialization, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Reviews: Global Convergence of Gradient Descent for Deep Linear Residual Networks

Neural Information Processing Systems, Jan-21-2025, 21:00:32 GMT

Response to authors' feedback: I thank the authors for the rebuttal. My score remains the same. With this initialization, the networks are shown to converge linearly to zero loss, under conditions (for discrete-time GD) that are different from, and perhaps conceptually simpler than, those of previous works. For instance, compared to reference [2] (Arora et al., "A convergence analysis of gradient descent for deep linear neural networks", ICLR 2019), this work completely removes the delta-balanced condition in [2] by showing that this condition actually holds, for most layers, on the GD trajectory (Lemma 4.2 and Eq. [...]). While certain elements have already been seen in previous works (e.g., the property in Lemma 4.2 is similar to the delta-balanced condition in [2], and the requirement of zero initialization for the last layer's weight has been seen in the "fixup initialization" of reference [21] in the context of residual networks), I think the proposed initialization as well as the convergence analysis here deserve credit for novelty.
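For readers unfamiliar with the term, the delta-balanced condition mentioned in the review is usually stated as below. This is a sketch recalled from reference [2], not reproduced in this listing; the symbols $W_1, \dots, W_N$ (the layer weight matrices of a deep linear network) and $\delta \ge 0$ are assumed notation.

$$
\left\| W_{j+1}^{\top} W_{j+1} - W_j W_j^{\top} \right\|_F \le \delta, \qquad j = 1, \dots, N-1 .
$$

Under this reading, the condition asks that adjacent layers have nearly matching Gram matrices; the review's point is that the present paper does not assume such a property but shows a similar one holds along the gradient descent trajectory.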

deep linear residual network, global convergence, initialization, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.62)

Reviews: Global Convergence of Gradient Descent for Deep Linear Residual Networks

Neural Information Processing Systems, Jan-21-2025, 21:00:22 GMT

The reviewers appreciated the work on the initialization even if they deemed it incremental. The experiments on the nonlinear network in the rebuttal were useful, and I encourage the authors to expand the experimental section with more realistic setups to show how the theory matters in practice.

artificial intelligence, deep linear residual network, machine learning, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Global Convergence of Gradient Descent for Deep Linear Residual Networks

Neural Information Processing Systems, Oct-9-2024, 14:01:47 GMT

We analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization: zero-asymmetric (ZAS) initialization. It is motivated by avoiding stable manifolds of saddle points. We prove that under the ZAS initialization, for an arbitrary target matrix, gradient descent converges to an $\varepsilon$-optimal point in $O\left( L^3 \log(1/\varepsilon) \right)$ iterations, which scales polynomially with the network depth $L$. Our result and the $\exp(\Omega(L))$ convergence time for the standard initialization (Xavier or near-identity) \cite{shamir2018exponential} together demonstrate the importance of the residual structure and the initialization in the optimization for deep linear neural networks, especially when $L$ is large.

deep linear residual network, global convergence, initialization, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)

Global Convergence of Gradient Descent for Deep Linear Residual Networks

Lei Wu, Qingcan Wang, Chao Ma

Neural Information Processing Systems, Mar-19-2020, 02:03:33 GMT

We analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization: zero-asymmetric (ZAS) initialization. It is motivated by avoiding stable manifolds of saddle points. We prove that under the ZAS initialization, for an arbitrary target matrix, gradient descent converges to an $\varepsilon$-optimal point in $O\left( L^3 \log(1/\varepsilon) \right)$ iterations, which scales polynomially with the network depth $L$. Our result and the $\exp(\Omega(L))$ convergence time for the standard initialization (Xavier or near-identity) \cite{shamir2018exponential} together demonstrate the importance of the residual structure and the initialization in the optimization for deep linear neural networks, especially when $L$ is large. Papers published at the Neural Information Processing Systems Conference.

deep linear residual network, global convergence, initialization, (2 more...)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.94)
